Dynamic Language Models for Streaming Text
نویسندگان
چکیده
منابع مشابه
Dynamic Language Models for Streaming Text
We present a probabilistic language model that captures temporal dynamics and conditions on arbitrary non-linguistic context features. These context features serve as important indicators of language changes that are otherwise difficult to capture using text data by itself. We learn our model in an efficient online fashion that is scalable for large, streaming data. With five streaming datasets...
متن کاملExponential Reservoir Sampling for Streaming Language Models
We show how rapidly changing textual streams such as Twitter can be modelled in fixed space. Our approach is based upon a randomised algorithm called Exponential Reservoir Sampling, unexplored by this community until now. Using language models over Twitter and Newswire as a testbed, our experimental results based on perplexity support the intuition that recently observed data generally outweigh...
متن کاملVisualizing Streaming Text Data with Dynamic Maps
The many endless rivers of text now available present a serious challenge in the task of gleaning, analyzing and discovering useful information. In this paper, we describe a methodology for visualizing text streams in real time. The approach automatically groups similar messages into “countries,” with keyword summaries, using semantic analysis, graph clustering and map generation techniques. It...
متن کاملFine-tuned Language Models for Text Classification
Transfer learning has revolutionized computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch. We propose Fine-tuned Language Models (FitLaM), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a state-of-the-art language model. Our method significantly out...
متن کاملUsing Language Models for Text Classification
This paper describes an approach to text classification using language models. This approach is a natural extension of the traditional Naïve Bayes classifier, in which we replace the Laplace smoothing by some more sophisticated smoothing methods. In this paper, we tested four smoothing methods commonly used in information retrieval. Our experimental results show that using a language model, we ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Transactions of the Association for Computational Linguistics
سال: 2014
ISSN: 2307-387X
DOI: 10.1162/tacl_a_00175